win probability
- North America > United States (0.04)
- North America > Canada (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
- (3 more...)
Assessing win strength in MLB win prediction models
In Major League Baseball, strategy and planning are major factors in determining the outcome of a game. Previous studies have aided this by building machine learning models for predicting the winning team of any given game. We extend this work by training a comprehensive set of machine learning models using a common dataset. In addition, we relate the win probabilities produced by these models to win strength as measured by score differential. In doing so we show that the most common machine learning models do indeed demonstrate a relationship between predicted win probability and the strength of the win. Finally, we analyze the results of using predicted win probabilities as a decision making mechanism on run-line betting. We demonstrate positive returns when utilizing appropriate betting strategies, and show that naive use of machine learning models for betting lead to significant loses.
- North America > United States > Nevada (0.04)
- North America > United States > Kansas > Douglas County > Lawrence (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.68)
Scaling Laws For Scalable Oversight
Engels, Joshua, Baek, David D., Kantamneni, Subhash, Tegmark, Max
Scalable oversight, the process by which weaker AI systems supervise stronger ones, has been proposed as a key strategy to control future superintelligent systems. However, it is still unclear how scalable oversight itself scales. To address this gap, we propose a framework that quantifies the probability of successful oversight as a function of the capabilities of the overseer and the system being overseen. Specifically, our framework models oversight as a game between capability-mismatched players; the players have oversight-specific Elo scores that are a piecewise-linear function of their general intelligence, with two plateaus corresponding to task incompetence and task saturation. We validate our framework with a modified version of the game Nim and then apply it to four oversight games: Mafia, Debate, Backdoor Code and Wargames. For each game, we find scaling laws that approximate how domain performance depends on general AI system capability. We then build on our findings in a theoretical study of Nested Scalable Oversight (NSO), a process in which trusted models oversee untrusted stronger models, which then become the trusted models in the next step. We identify conditions under which NSO succeeds and derive numerically (and in some cases analytically) the optimal number of oversight levels to maximize the probability of oversight success. We also apply our theory to our four oversight games, where we find that NSO success rates at a general Elo gap of 400 are 13.5% for Mafia, 51.7% for Debate, 10.0% for Backdoor Code, and 9.4% for Wargames; these rates decline further when overseeing stronger systems.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- Leisure & Entertainment > Games (1.00)
- Government > Regional Government > North America Government > United States Government (0.67)
- Information Technology (0.67)
Efficient Portfolio Selection through Preference Aggregation with Quicksort and the Bradley--Terry Model
Ge, Yurun, Böttcher, Lucas, Chou, Tom, D'Orsogna, Maria R.
How to allocate limited resources to projects that will yield the greatest long-term benefits is a problem that often arises in decision-making under uncertainty. For example, organizations may need to evaluate and select innovation projects with risky returns. Similarly, when allocating resources to research projects, funding agencies are tasked with identifying the most promising proposals based on idiosyncratic criteria. Finally, in participatory budgeting, a local community may need to select a subset of public projects to fund. Regardless of context, agents must estimate the uncertain values of a potentially large number of projects. Developing parsimonious methods to compare these projects, and aggregating agent evaluations so that the overall benefit is maximized, are critical in assembling the best project portfolio. Unlike in standard sorting algorithms, evaluating projects on the basis of uncertain long-term benefits introduces additional complexities. We propose comparison rules based on Quicksort and the Bradley--Terry model, which connects rankings to pairwise "win" probabilities. In our model, each agent determines win probabilities of a pair of projects based on his or her specific evaluation of the projects' long-term benefit. The win probabilities are then appropriately aggregated and used to rank projects. Several of the methods we propose perform better than the two most effective aggregation methods currently available. Additionally, our methods can be combined with sampling techniques to significantly reduce the number of pairwise comparisons. We also discuss how the Bradley--Terry portfolio selection approach can be implemented in practice.
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- North America > United States > Florida > Alachua County > Gainesville (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- (11 more...)
- Government (0.68)
- Leisure & Entertainment > Games (0.46)
- North America > United States (0.04)
- North America > Canada (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
- (3 more...)
Learning Concave Bid Shading Strategies in Online Auctions via Measure-valued Proximal Optimization
Nodozi, Iman, Gligorijevic, Djordje, Halder, Abhishek
This work proposes a bid shading strategy for first-price auctions as a measure-valued optimization problem. We consider a standard parametric form for bid shading and formulate the problem as convex optimization over the joint distribution of shading parameters. After each auction, the shading parameter distribution is adapted via a regularized Wasserstein-proximal update with a data-driven energy functional. This energy functional is conditional on the context, i.e., on publisher/user attributes such as domain, ad slot type, device, or location. The proposed algorithm encourages the bid distribution to place more weight on values with higher expected surplus, i.e., where the win probability and the value gap are both large. We show that the resulting measure-valued convex optimization problem admits a closed form solution. A numerical example illustrates the proposed method.
- Marketing (0.94)
- Information Technology > Services (0.83)
Towards Explaining Monte-Carlo Tree Search by Using Its Enhancements
Kowalski, Jakub, Winands, Mark H. M., Wiśniewski, Maksymilian, Reda, Stanisław, Wilbik, Anna
--Typically, research on Explainable Artificial Intelligence (XAI) focuses on black-box models within the context of a general policy in a known, specific domain. This paper advocates for the need for knowledge-agnostic explainability applied to the subfield of XAI called Explainable Search, which focuses on explaining the choices made by intelligent search techniques. It proposes Monte-Carlo Tree Search (MCTS) enhancements as a solution to obtaining additional data and providing higher-quality explanations while remaining knowledge-free, and analyzes the most popular enhancements in terms of the specific types of explainability they introduce. So far, no other research has considered the explainability of MCTS enhancements. We present a proof-of-concept that demonstrates the advantages of utilizing enhancements.
- Europe > Poland > Lower Silesia Province > Wroclaw (0.04)
- Europe > Netherlands > Limburg > Maastricht (0.04)
DIAMOND: An LLM-Driven Agent for Context-Aware Baseball Highlight Summarization
Kang, Jeonghun, Kwon, Soonmok, Lee, Joonseok, Kim, Byung-Hak
Traditional approaches -- such as Win Probability Added (WPA)-based ranking or computer vision-driven event detection -- can identify scoring plays but often miss strategic depth, momentum shifts, and storyline progression. Manual curation remains the gold standard but is resource-intensive and not scalable. We introduce DIAMOND, an LLM-driven agent for context-aware baseball highlight summarization that integrates structured sports analytics with natural language reasoning. DIAMOND leverages sabermetric features -- Win Expectancy, WPA, and Leverage Index -- to quantify play importance, while an LLM module enhances selection based on contextual narrative value. This hybrid approach ensures both quantitative rigor and qualitative richness, surpassing the limitations of purely statistical or vision-based systems. Evaluated on five diverse Korean Baseball Organization League games, DIAMOND improves F1-score from 42.9% (WPA-only) to 84.8%, outperforming both commercial and statistical baselines. Though limited in scale, our results highlight the potential of modular, interpretable agent-based frameworks for event-level summarization in sports and beyond.
- Leisure & Entertainment > Sports > Baseball (1.00)
- Leisure & Entertainment > Games (0.68)
Inferring Event Descriptions from Time Series with Language Models
Tan, Mingtian, Merrill, Mike A., Gottesman, Zack, Althoff, Tim, Evans, David, Hartvigsen, Tom
Time series data measure how environments change over time and drive decision-making in critical domains like finance and healthcare. When analyzing time series, we often seek to understand the underlying events occurring in the measured environment. For example, one might ask: What caused a sharp drop in the stock price? Events are often described with natural language, so we conduct the first study of whether Large Language Models (LLMs) can infer natural language events from time series. We curate a new benchmark featuring win probabilities collected from 4,200 basketball and American football games, featuring 1.7M timesteps with real value data and corresponding natural language events. Building on the recent wave of using LLMs on time series, we evaluate 16 LLMs and find that they demonstrate promising abilities to infer events from time series data. The open-weights DeepSeek-R1 32B model outperforms proprietary models like GPT-4o. Despite this impressive initial performance, we also find clear avenues to improve recent models, as we identify failures when altering the provided context, event sequence lengths, and evaluation strategy. (All resources needed to reproduce our work are available: https://github.com/BennyTMT/GAMETime)
- North America > United States > Virginia (0.05)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
- Leisure & Entertainment > Sports > Football (1.00)
- Leisure & Entertainment > Sports > Basketball (1.00)
- Health & Medicine (1.00)
- (2 more...)
Natural Language Reinforcement Learning
Feng, Xidong, Wan, Ziyu, Fu, Haotian, Liu, Bo, Yang, Mengyue, Koushik, Girish A., Hu, Zhiyuan, Wen, Ying, Wang, Jun
Reinforcement Learning (RL) mathematically formulates decision-making with Markov Decision Process (MDP). With MDPs, researchers have achieved remarkable breakthroughs across various domains, including games, robotics, and language models. This paper seeks a new possibility, Natural Language Reinforcement Learning (NLRL), by extending traditional MDP to natural language-based representation space. Specifically, NLRL innovatively redefines RL principles, including task objectives, policy, value function, Bellman equation, and policy iteration, into their language counterparts. With recent advancements in large language models (LLMs), NLRL can be practically implemented to achieve RL-like policy and value improvement by either pure prompting or gradient-based training. Experiments over Maze, Breakthrough, and Tic-Tac-Toe games demonstrate the effectiveness, efficiency, and interpretability of the NLRL framework among diverse use cases. Our code will be released at https://github.com/waterhorse1/Natural-language-RL.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)